Here I want to apply the projected neighbors graph visualization to the pancreas dataset that is used in the scVelo demo and compare it to the visualization on the U2OS dataset.
Note: the edge weights option was added to the graphViz function after this analysis was run, so every graphViz call here needs to be updated to pass weighted = TRUE or weighted = FALSE explicitly to avoid errors.
Use the reticulate package to use scVelo from within R:
Extract spliced and unspliced data
Extract PCA coordinates
Filter genes
Downsample cells to make things easier
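A minimal sketch of the filtering and downsampling steps on a toy count matrix. The matrix, the min.cells cutoff, and the subset size n.sub are illustrative assumptions, not the values used in this analysis.

```r
set.seed(1)

# Toy genes x cells count matrix standing in for the spliced counts
counts <- matrix(rpois(200 * 50, lambda = 1), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("cell", 1:50)))

# Keep genes detected (count > 0) in at least min.cells cells (assumed cutoff)
min.cells <- 30
keep.genes <- rowSums(counts > 0) >= min.cells
counts.filt <- counts[keep.genes, , drop = FALSE]

# Randomly keep n.sub cells (assumed subset size)
n.sub <- 20
cell.sub <- sample(colnames(counts.filt), n.sub)
counts.sub <- counts.filt[, cell.sub, drop = FALSE]

dim(counts.sub)
```

Setting the seed before sample() keeps the same cell subset across reruns, which matters when comparing graphs built from that subset.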
Normalize for dimensional reduction
## Warning in if (!class(counts) %in% c("dgCMatrix", "dgTMatrix")) {: the condition
## has length > 1 and only the first element will be used
## Converting to sparse matrix ...
## Normalizing matrix with 3696 cells and 8636 genes
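For reference, a generic library-size normalization followed by a log transform can be sketched in base R as below. This is an assumption about the general idea only and may differ from the normalization function that printed the messages above; the toy matrix is made up.

```r
# Toy genes x cells count matrix (columns are cells)
counts <- matrix(c(10, 0, 5,
                   20, 30, 10,
                   0, 5, 15),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("c1", "c2", "c3")))

# Scale each cell to counts per million, then log-transform with a pseudocount
lib.size <- colSums(counts)
cpm <- sweep(counts, 2, lib.size, "/") * 1e6
mat.norm <- log10(cpm + 1)
```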
Dimensional reduction
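A minimal sketch of the dimensional reduction step using prcomp from base R on a toy normalized matrix. The number of PCs kept (n.pcs) is an assumed value, and the actual analysis may have used a different PCA routine.

```r
set.seed(1)

# Toy normalized genes x cells matrix
mat.norm <- matrix(rnorm(100 * 30), nrow = 100)

# prcomp expects observations (cells) in rows, so transpose
pca <- prcomp(t(mat.norm), center = TRUE, scale. = FALSE)

# Keep the first n.pcs components as cell scores (assumed number of PCs)
n.pcs <- 5
pc.scores <- pca$x[, 1:n.pcs]
```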
Run velocyto on the pancreas data
Scores of observed and projected states in PC space
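One way to get comparable scores is to fit the PCA on the observed expression and then project both the observed and the projected states with the same centering and loadings. The sketch below assumes the projected state is simply the observed state plus the velocity; the toy matrices and the names curr, vel, and proj are illustrative.

```r
set.seed(1)
n.genes <- 50
n.cells <- 30

# Toy genes x cells matrices: observed expression and velocity
curr <- matrix(rexp(n.genes * n.cells), nrow = n.genes)
vel <- matrix(rnorm(n.genes * n.cells, sd = 0.1), nrow = n.genes)
proj <- curr + vel  # assumed definition of the projected state

# Fit PCA on the observed states only
pca <- prcomp(t(curr), center = TRUE, scale. = FALSE)
n.pcs <- 5
loadings <- pca$rotation[, 1:n.pcs]

# Apply the same centering and rotation to both sets of states
curr.scores <- scale(t(curr), center = pca$center, scale = FALSE) %*% loadings
proj.scores <- scale(t(proj), center = pca$center, scale = FALSE) %*% loadings
```

Because both matrices share one coordinate system, the difference proj.scores - curr.scores can be read as each cell's velocity in PC space.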
Graph visualization on subset of cells from PC coordinates
Graph visualization on subset of cells from gene expression
This uses common.genes: the intersection of the overdispersed genes (odsGenes) and the genes in the velocity output (genes with high correlation between spliced and unspliced counts).
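The gene intersection can be sketched as follows. The gene names and the vel.genes vector are made up for illustration; only odsGenes and common.genes are names used in this analysis.

```r
# Toy gene sets (illustrative names only)
odsGenes <- c("Ins1", "Gcg", "Sst", "Ppy", "Neurog3")   # overdispersed genes
vel.genes <- c("Gcg", "Neurog3", "Sox9", "Ins1")        # genes in velocity output

# Genes present in both sets, in the order they appear in odsGenes
common.genes <- intersect(odsGenes, vel.genes)
common.genes
```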
Effects of changing k, distance measure, similarity measure, and similarity threshold:
Using PC generated graph
L1 vs L2 as distance measure:
# using k=50, similarity=cosine, threshold=0.25 (as in the calls below)
set.seed(1)
graphViz(observed = curr.scores.cellsub, projected = proj.scores.cellsub,
         k = 50, distance_metric = "L1", similarity_metric = "cosine",
         similarity_threshold = 0.25, weighted = FALSE,
         cell.colors = cell.cols.grph, title = "L1 Distance",
         plot = TRUE, return_graph = FALSE)
set.seed(1)
graphViz(observed = curr.scores.cellsub, projected = proj.scores.cellsub,
         k = 50, distance_metric = "L2", similarity_metric = "cosine",
         similarity_threshold = 0.25, weighted = FALSE,
         cell.colors = cell.cols.grph, title = "L2 Distance",
         plot = TRUE, return_graph = FALSE)
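To get a feel for how much the choice of distance changes the neighbor sets, here is a small base R comparison of L1 (Manhattan) and L2 (Euclidean) nearest neighbors on toy scores. It is a generic illustration, not the neighbor search inside graphViz; the knn helper is hypothetical.

```r
set.seed(1)

# Toy cells x PCs score matrix
scores <- matrix(rnorm(20 * 3), nrow = 20)

d.l1 <- as.matrix(dist(scores, method = "manhattan"))
d.l2 <- as.matrix(dist(scores, method = "euclidean"))

# Hypothetical helper: indices of the k nearest neighbors of each cell,
# skipping the first entry of the ordering (the cell itself, distance 0)
knn <- function(d, k) t(apply(d, 1, function(row) order(row)[2:(k + 1)]))

k <- 5
nn.l1 <- knn(d.l1, k)
nn.l2 <- knn(d.l2, k)

# Fraction of neighbors shared between the two distance measures, per cell
overlap <- sapply(seq_len(nrow(scores)), function(i) {
  length(intersect(nn.l1[i, ], nn.l2[i, ])) / k
})
summary(overlap)
```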
Pearson correlation vs Cosine similarity:
set.seed(1)
graphViz(observed = curr.scores.cellsub, projected = proj.scores.cellsub,
         k = 10, distance_metric = "L2", similarity_metric = "cosine",
         similarity_threshold = 0.25, weighted = FALSE,
         cell.colors = cell.cols.grph, title = "Cosine Similarity",
         plot = TRUE, return_graph = FALSE)
set.seed(1)
graphViz(observed = curr.scores.cellsub, projected = proj.scores.cellsub,
         k = 10, distance_metric = "L2", similarity_metric = "pearson",
         similarity_threshold = -0.5, weighted = FALSE,
         cell.colors = cell.cols.grph, title = "Pearson Correlation",
         plot = TRUE, return_graph = FALSE)
It looks like Pearson correlation is more conservative than cosine similarity.
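A possible contributor to the difference: Pearson correlation is the cosine similarity of the mean-centered vectors, so the two measures can give different values for the same pair of vectors. The toy vectors and the cosine helper below are illustrative, and this is a general identity rather than a statement about how graphViz computes either measure.

```r
# Hypothetical helper: cosine similarity of two numeric vectors
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

a <- c(1, 2, 3, 4)
b <- c(2, 3, 4, 6)

cos.ab <- cosine(a, b)                          # cosine on the raw vectors
cor.ab <- cor(a, b)                             # Pearson correlation
cor.as.cos <- cosine(a - mean(a), b - mean(b))  # cosine after centering

c(cosine = cos.ab, pearson = cor.ab, centered.cosine = cor.as.cos)
```

For these two vectors the Pearson value is lower than the raw cosine value, so a fixed threshold would treat the pair differently depending on the measure.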
Number of out edges k:
Similarity threshold:
## [1] "Done finding neighbors"
## [1] "Done making graph"
## [1] "Done finding neighbors"
## [1] "Done making graph"
## [1] "Done finding neighbors"
## [1] "Done making graph"
## [1] "Done finding neighbors"
## [1] "Done making graph"
## [1] "Done finding neighbors"
## [1] "Done making graph"
## [1] "Done finding neighbors"
## [1] "Done making graph"
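To reason about how k and the similarity threshold interact, here is a rough sketch of one way such a graph could be built: for each cell, take the k observed cells nearest to its projected state, then keep an edge only if the cell's velocity points toward that neighbor with at least the threshold similarity. The build_edges helper, the toy data, and the exact rule are assumptions for illustration and are not the graphViz implementation.

```r
set.seed(1)
n.cells <- 30

# Toy observed and projected scores (cells x PCs)
obs <- matrix(rnorm(n.cells * 3), ncol = 3)
proj <- obs + matrix(rnorm(n.cells * 3, sd = 0.3), ncol = 3)

cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Hypothetical helper: returns a two-column matrix of directed edges
build_edges <- function(obs, proj, k, threshold) {
  edges <- lapply(seq_len(nrow(obs)), function(i) {
    # L2 distance from cell i's projected state to every observed cell
    d <- sqrt(rowSums(sweep(obs, 2, proj[i, ])^2))
    d[i] <- Inf  # never pick the cell itself
    nn <- order(d)[seq_len(k)]
    # Similarity between cell i's velocity and the direction to each neighbor
    sim <- sapply(nn, function(j) {
      cosine(proj[i, ] - obs[i, ], obs[j, ] - obs[i, ])
    })
    keep <- nn[sim >= threshold]
    if (length(keep) == 0) return(NULL)
    cbind(from = i, to = keep)
  })
  do.call(rbind, edges)
}

e.k5 <- build_edges(obs, proj, k = 5, threshold = 0.25)
e.k10 <- build_edges(obs, proj, k = 10, threshold = 0.25)
e.strict <- build_edges(obs, proj, k = 10, threshold = 0.75)

c(k5 = nrow(e.k5), k10 = nrow(e.k10), k10.strict = nrow(e.strict))
```

Under this rule, raising k can only add candidate edges and raising the threshold can only remove them.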
Cell consistency score: the mean correlation between a cell's velocity and the velocities of its nearest neighbors
.. find the n nearest neighbors for each cell, e.g. …
.. calculate the consistency score for each cell
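The two steps can be sketched as below, using Euclidean neighbors in a toy 2D embedding and Pearson correlation between velocity vectors. The embedding, the velocity matrix, and n = 5 are illustrative assumptions.

```r
set.seed(1)
n.cells <- 40

# Toy data: 2D embedding coordinates and a cells x genes velocity matrix
emb <- matrix(rnorm(n.cells * 2), ncol = 2)
vel <- matrix(rnorm(n.cells * 10), nrow = n.cells)

# Step 1: n nearest neighbors of each cell in the embedding (self excluded)
n <- 5
d <- as.matrix(dist(emb))
nn <- t(apply(d, 1, function(row) order(row)[2:(n + 1)]))

# Step 2: mean correlation between each cell's velocity and its neighbors'
consistency <- sapply(seq_len(n.cells), function(i) {
  mean(sapply(nn[i, ], function(j) cor(vel[i, ], vel[j, ])))
})

summary(consistency)
```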
Cell consistency scores on the embedding (blue = low, red = high)
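A blue-to-red color mapping of this kind can be produced in base R as sketched here. The scores are made up and the number of color bins is an arbitrary choice.

```r
# Toy consistency scores
scores <- c(-0.2, 0.1, 0.5, 0.9)

# 100-step palette from blue (low) to red (high)
pal <- colorRampPalette(c("blue", "red"))(100)

# Bin each score into one of the 100 palette steps
idx <- cut(scores, breaks = 100, labels = FALSE)
cols <- pal[idx]
cols
```

The resulting vector can be passed as the point colors when plotting the embedding.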
Number of out edges k:
Similarity threshold:
Consistency scores in the force-directed graph (FDG) layout compared to PCA and UMAP, all computed on the same cell subset